Robust Systems for Preposition Error Correction Using Wikipedia Revisions
نویسندگان
چکیده
We show that existing methods for training preposition error correction systems, whether using well-edited text or error-annotated corpora, do not generalize across very different test sets. We present a new, large errorannotated corpus and use it to train systems that generalize across three different test sets, each from a different domain and with different error characteristics. This new corpus is automatically extracted from Wikipedia revisions and contains over one million instances of preposition corrections.
منابع مشابه
An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions
This paper presents a proof-of-concept tool for providing automated explicit feedback to language learners based on data mined from Wikipedia revisions. The tool takes a sentence with a grammatical error as input and displays a ranked list of corrections for that error along with evidence to support each correction choice. We use lexical and part-of-speech contexts, as well as query expansion w...
متن کاملThe UI System in the HOO 2012 Shared Task on Error Correction
We describe the University of Illinois (UI) system that participated in the Helping Our Own (HOO) 2012 shared task, which focuses on correcting preposition and determiner errors made by non-native English speakers. The task consisted of three metrics: Detection, Recognition, and Correction, and measured performance before and after additional revisions to the test data were made. Out of 14 team...
متن کاملUsing Parse Features for Preposition Selection and Error Detection
We evaluate the effect of adding parse features to a leading model of preposition usage. Results show a significant improvement in the preposition selection task on native speaker text and a modest increment in precision and recall in an ESL error detection task. Analysis of the parser output indicates that it is robust enough in the face of noisy non-native writing to extract useful information.
متن کاملNAIST at the HOO 2012 Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) error correction system in the Helping Our Own (HOO) 2012 Shared Task. Our system targets preposition and determiner errors with spelling correction as a pre-processing step. The result shows that spelling correction improves the Detection, Correction, and Recognition Fscores for preposition errors. With regard to preposi...
متن کاملMining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History
Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also prove interesting material for linguistic studies. In this article, we present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely-available resource built by automatically mining Wikipedia’s revision hi...
متن کامل